Benjamin M. Gyori, Harvard Medical School, 7/29/2018
%matplotlib inline
import json
import numpy
import pandas
import matplotlib.pyplot as plt
from IPython.display import Image
plt.rcParams['font.size'] = 18
plt.rcParams['figure.figsize'] = [10, 8]
from collections import Counter
from pysb.simulator import ScipyOdeSimulator
def df_from_counter(c, idx):
    # Turn a Counter into a two-column DataFrame (label, count)
    return pandas.DataFrame.from_dict(c, orient='index').reset_index().rename(
        columns={'index': idx, 0: 'count'})
Image('demo/indra_concept_detailed.png')
Image('demo/indra_apps.png')
from indra.sources import eidos, hume, sofia, cwms
from indra.assemblers import GraphAssembler, CAGAssembler, PysbAssembler
from indra.statements import Influence, Concept
We first look at a simple example of the text-to-model assembly pipeline. The use case here assumes an expert who aims to rapidly prototype a model based on assumptions they describe in English.
Image('demo/nl_modeling.jpg', retina=True)
Image('demo/model_mouth.png', retina=True)
text = """A significant increase in precipitation resulted in food insecurity and
a decrease in humanitarian interventions. Actually, food insecurity itself can
lead to conflict, and in turn, conflict can drive food insecurity.
Generally, humanitarian interventions reduce conflict."""
Image('demo/eidos.png')
eidos_processor = eidos.process_text(text, webservice='http://localhost:5000')
eidos_processor.statements
Image('demo/influence.png')
ga = GraphAssembler(eidos_processor.statements)
ga.make_model()
ga.save_pdf('text_to_model.png')
Image('text_to_model.png', width=800)
def assemble_pysb(stmts, reverse_effects=True):
    pa = PysbAssembler()
    pa.add_statements(stmts)
    model = pa.make_model(reverse_effects=reverse_effects)
    return model
def simulate_model(model, ts=None, pd=None):
    if ts is None:
        ts = numpy.linspace(0, 1000, 100)
    sim = ScipyOdeSimulator(model, ts)
    if pd is None:
        res = sim.run()
    else:
        res = sim.run(param_values=pd)
    df = res.dataframe
    # Rename the __s0, __s1, ... species columns to readable monomer names
    df = df.rename(columns={'__s%d' % i: s.monomer_patterns[0].monomer.name
                            for i, s in enumerate(model.species)})
    return df
model = assemble_pysb(eidos_processor.statements, True)
df = simulate_model(model, ts=None, pd={'kf_f_deg_1': 1e-4})
df.plot()
Both conflict and food insecurity now decrease.
df = simulate_model(model, ts=None, pd={'humanitarian_interventions_0': 1e6, 'precipitation_0': 1e3,
'kf_f_deg_1': 1e-3})
df.plot(ylim=[0, 15000])
text2 = text + ' Displacement causes food insecurity.'
df = simulate_model(assemble_pysb(eidos.process_text(text2, webservice='http://localhost:5000').statements),
ts=None, pd={'humanitarian_interventions_0': 1e6, 'precipitation_0': 1e3,
'kf_f_deg_1': 1e-3})
df.plot(ylim=[0, 15000])
Image('demo/eval_doc_text_model.png')
Image('demo/demo_arch_pimtg.png')
from run_eval import *
# Get the IDs of all the documents in the docs folder
docnames = sorted(['.'.join(os.path.basename(f).split('.')[:-1])
for f in glob.glob('docs/*.txt')],
key=lambda x: int(x.split('_')[0]))
exclude = '31_South_Sudan_2018_Humanitarian_Needs_Overview'
docnames = [d for d in docnames if d != exclude]
print('Using %d documents' % len(docnames))
As an aside: indra.literature.elsevier_client and indra.literature.newsapi_client can be used to query for and collect reading corpora on demand (e.g., we collected Elsevier-published papers discussing "food security + intervention").
eidos_stmts = read_eidos(docnames)
cwms_stmts = read_cwms_sentences(extract_eidos_text(docnames), read=False)
hume_stmts = read_hume('bbn/wm_m6_0628.json-ld')
sofia_stmts = read_sofia('sofia/MITRE_June18_v1.xlsx')
We can simply concatenate Statements from different sources: the power of a common knowledge representation!
statements = eidos_stmts + cwms_stmts + hume_stmts + sofia_stmts
print('%d total statements' % len(statements))
Let's plot how many raw Statements we get from each source
source_cnt = sorted(Counter([st.evidence[0].source_api for st in statements]).items(),
key=lambda x: x[1], reverse=True)
plt.bar(*zip(*source_cnt))
Statements can be inspected manually / programmatically, and also serve as an exchange format
eidos_stmts[0]
eidos_stmts[0].evidence[0].text
eidos_stmts[0].subj.db_refs['UN']
print(json.dumps(eidos_stmts[0].to_json(), indent=1))
Let's now start assembly for real! First, we map grounding between ontologies.
om = ontology_mapper.OntologyMapper(statements, ontology_mapper.wm_ontomap,
symmetric=False, scored=True)
om.map_statements()
statements = om.statements
Image('demo/ontomap.png')
Let's look at a specific example
statements[261].subj.db_refs
We see here that the original string "food security" is grounded to the UN, HUME, SOFIA and CWMS ontologies simultaneously.
We tabulate each Concept by the ontologies it is grounded to. This shows that the majority of Concepts are now grounded to multiple ontologies, with 2500+ grounded to at least 3 ontologies.
cnt = Counter([tuple(sorted(list(set(conc.db_refs.keys())-{'TEXT', 'FAO', 'WDI'})))
for stmt in statements for conc in stmt.agent_list()])
df_from_counter(cnt, 'Grounding')
Next, let's filter out Statements with unreliably grounded or ungrounded concepts
statements = ac.filter_grounded_only(statements, score_threshold=0.7)
Now that we have mapped ontologies, let's filter Statements by relevance with respect to a list of UN terms.
Image('demo/prior_model.png', retina=True)
statements = ac.filter_by_db_refs(statements, 'UN',
['conflict', 'food_security', 'food_insecurity', 'flooding', 'food_production',
'human_migration', 'drought', 'food_availability', 'market',
'precipitation'], policy='all',
match_suffix=True)
We make an assembly assumption: where an Influence's subject polarity is not explicit, we assume it is positive. We then filter out any remaining Statements that don't have both subject and object polarity set.
assume_polarity(statements)
statements = filter_has_polarity(statements)
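The two helpers above come from run_eval and their implementations aren't shown here. A minimal sketch of what they plausibly do, using a hypothetical stand-in for INDRA's Influence statements (whose subj_delta/obj_delta dicts carry a 'polarity' entry), might look like this:

```python
# Hypothetical stand-in for an INDRA Influence statement: subj_delta and
# obj_delta are dicts whose 'polarity' entry is 1, -1, or None.
class FakeInfluence:
    def __init__(self, subj_polarity, obj_polarity):
        self.subj_delta = {'polarity': subj_polarity}
        self.obj_delta = {'polarity': obj_polarity}

def assume_polarity(stmts):
    """Assume implicit positive polarity for subjects lacking an explicit one."""
    for stmt in stmts:
        if stmt.subj_delta.get('polarity') is None:
            stmt.subj_delta['polarity'] = 1

def filter_has_polarity(stmts):
    """Keep only statements with both subject and object polarity set."""
    return [s for s in stmts
            if s.subj_delta.get('polarity') is not None
            and s.obj_delta.get('polarity') is not None]

stmts = [FakeInfluence(None, 1), FakeInfluence(-1, None)]
assume_polarity(stmts)
stmts = filter_has_polarity(stmts)
# The first statement survives (subject polarity assumed positive);
# the second is dropped because its object polarity is still unknown.
```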
Next, we run preassembly on the set of Statements. We first construct a joint hierarchy of all the reader ontologies.
def get_joint_hierarchies():
    eidos_ont = os.path.join(os.path.abspath(eidos.__path__[0]),
                             'eidos_ontology.rdf')
    trips_ont = os.path.join(os.path.abspath(cwms.__path__[0]),
                             'trips_ontology.rdf')
    hume_ont = os.path.join(os.path.abspath(hume.__path__[0]),
                            'hume_ontology.rdf')
    hm = HierarchyManager(eidos_ont, True, True)
    hm.extend_with(trips_ont)
    hm.extend_with(hume_ont)
    hierarchies = {'entity': hm}
    return hierarchies
hierarchies = get_joint_hierarchies()
The code below is usually a one-liner but here we break it up into multiple parts to show exactly what happens.
# Combine duplicates with the Preassembler
pa = Preassembler(hierarchies, statements)
unique_stmts = pa.combine_duplicates()
print('%d unique statements' % len(unique_stmts))
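Conceptually, combine_duplicates groups Statements that share a matches key (a canonical representation of the relation) and pools their evidence. A rough sketch of that idea, using hypothetical stand-in objects rather than INDRA's real Statement classes:

```python
from collections import defaultdict

# Hypothetical stand-in: a statement with a subject, object, and evidence list.
class FakeStmt:
    def __init__(self, subj, obj, evidence):
        self.subj, self.obj = subj, obj
        self.evidence = list(evidence)

    def matches_key(self):
        # Canonical representation used to detect duplicates
        return (self.subj, self.obj)

def combine_duplicates(stmts):
    """Merge statements with identical matches keys, pooling their evidence."""
    groups = defaultdict(list)
    for stmt in stmts:
        groups[stmt.matches_key()].append(stmt)
    unique = []
    for members in groups.values():
        merged = members[0]
        for other in members[1:]:
            merged.evidence += other.evidence
        unique.append(merged)
    return unique

stmts = [FakeStmt('conflict', 'food_insecurity', ['ev1']),
         FakeStmt('conflict', 'food_insecurity', ['ev2']),
         FakeStmt('precipitation', 'flooding', ['ev3'])]
unique = combine_duplicates(stmts)
# Two unique statements remain; the first carries both pieces of evidence.
```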
# Run the BeliefEngine with prior probabilities
be = BeliefEngine()
be.set_prior_probs(unique_stmts)
# Combine hierarchically-related Statements with the Preassembler
related_stmts = pa.combine_related(return_toplevel=False)
# Propagate beliefs over the hierarchy graph
be.set_hierarchy_probs(related_stmts)
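The intuition behind the BeliefEngine's prior is that a Statement's belief is one minus the probability that every piece of supporting evidence is wrong. A simplified sketch under an independence assumption, with made-up per-source error rates (not INDRA's actual defaults):

```python
# Hypothetical per-source error probabilities (illustrative values only)
ERROR_RATE = {'eidos': 0.3, 'hume': 0.4, 'sofia': 0.4, 'cwms': 0.35}

def prior_belief(sources):
    """Belief = 1 - P(all evidence is incorrect), assuming independence."""
    p_all_wrong = 1.0
    for src in sources:
        p_all_wrong *= ERROR_RATE[src]
    return 1.0 - p_all_wrong

b = prior_belief(['eidos', 'hume'])  # 1 - 0.3 * 0.4 = 0.88
```

More (and more diverse) evidence thus monotonically increases belief, which is why pooling evidence during deduplication matters.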
We can now filter by belief if we want to, and get only the top-level Statements in the hierarchy.
#statements = ac.filter_belief(related_stmts, 0.8)
top_stmts = ac.filter_top_level(related_stmts)
print('%d top-level statements' % len(top_stmts))
↑ This set of Statements is the input to Demo 2
The sources of individual pieces of evidence that contribute to Statements overall:
render_stmt_graph(statements).draw('demo/stmt_graph.png', prog='dot')
cnt = Counter([ev.source_api for stmt in top_stmts for ev in stmt.evidence])
df_from_counter(cnt, 'Source')
cnt = Counter([tuple(sorted(list({ev.source_api for ev in stmt.evidence}))) for stmt in top_stmts])
df_from_counter(cnt, 'Sources')
Let's pick one Statement and look at its evidence sentences in detail
stmt = [s for s in top_stmts if 'conflict' in s.subj.db_refs['UN'][0][0] and 'migration' in s.obj.db_refs['UN'][0][0]][0]
print('======')
print('source: %s' % stmt.subj.db_refs['UN'][0][0])
print('target: %s' % stmt.obj.db_refs['UN'][0][0])
print('======')
for ev in sorted(stmt.evidence, key=lambda x: x.text):
print('%s: %s' % (ev.source_api.upper(), ev.text))
print('--')
We can also find contradictions
standardize_names(top_stmts)
pa.stmts = top_stmts
contradictions = pa.find_contradicts()
for c1, c2 in contradictions:
print('%s\n <-> %s\n' % (c1, c2))
How can we decide between alternatives?
top_stmts = remove_contradicts(top_stmts, contradictions)
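One simple heuristic is to keep, from each contradicting pair, the statement with more supporting evidence (or higher belief). A sketch of such a remove_contradicts under that assumption, using hypothetical stand-ins (the actual helper from run_eval may resolve ties differently):

```python
def remove_contradicts(stmts, contradictions):
    """From each contradicting pair, drop the statement with less evidence."""
    to_drop = set()
    for s1, s2 in contradictions:
        loser = s1 if len(s1.evidence) < len(s2.evidence) else s2
        to_drop.add(id(loser))
    return [s for s in stmts if id(s) not in to_drop]

# Hypothetical stand-ins carrying only an evidence list
class FakeStmt:
    def __init__(self, evidence):
        self.evidence = evidence

a, b, c = FakeStmt(['e1', 'e2']), FakeStmt(['e3']), FakeStmt(['e4'])
kept = remove_contradicts([a, b, c], [(a, b)])
# b loses to a (one piece of evidence vs. two) and is dropped.
```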
model = assemble_pysb(top_stmts)
This model works and can be simulated, but it's not parameterized! Demo 3 covers methods to automatically parameterize models via gradable adjectives and indicators.
res_df = simulate_model(model)
res_df[['Conflict', 'Food_insecurity', 'Food_production']]
We can also use INDRA's explanation module and the ModelChecker to find paths that satisfy a given set of overall Influences (with polarities).
from indra.explanation.model_checker import PysbModelChecker
to_check = Influence(Concept('Conflict'), Concept('Food_insecurity'))
mc = PysbModelChecker(model, statements=[to_check])
mc.prune_influence_map()
paths = mc.check_model(max_paths=10)
paths[0][1]
paths[0][1].paths[8]
↑ This is a powerful explanation-finding tool: "in what ways could increased precipitation have resulted in reduced food security?" For now, we can also use it for debugging, i.e., "do the ways in which Precipitation affects Food insecurity in the model make sense?" In many cases the answer is clearly no, and such problems are easy to identify here in a problem-driven way (i.e., we focus on the problems in the model that matter).
Finally, let's look at an integrated simulation of the model with Topoflow in a relevant context.
Image('demo/emeli_integ.png')
ac.dump_statements(top_stmts, 'indra_eval_stmts.pkl')
from indra.assemblers.bmi_wrapper import BMIModel
model.name = 'indra_eval_model'
out_name_maps = {'atmosphere_water__rainfall_volume_flux':
                 'Precipitation'}
input_vars = ['Precipitation']
bmi_model = BMIModel(model, inputs=input_vars, stop_time=10000,
                     outside_name_map=out_name_maps)
bmi_model.export_into_python()
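Schematically, a BMI coupling loop alternates update calls between the two models and exchanges variables through get_value/set_value (the method names follow the BMI convention; the toy classes below are stand-ins, not Topoflow or the assembled model):

```python
class ToyRainModel:
    """Stand-in for a hydrology model exposing a BMI-like interface."""
    def __init__(self):
        self.t = 0
        self.rainfall = 0.0

    def update(self):
        self.t += 1
        self.rainfall = 10.0 * self.t  # made-up rainfall trajectory

    def get_value(self, name):
        assert name == 'atmosphere_water__rainfall_volume_flux'
        return self.rainfall

class ToyIndraModel:
    """Stand-in for the assembled model wrapped as a BMIModel."""
    def __init__(self):
        self.precipitation = 0.0
        self.food_insecurity = 0.0

    def set_value(self, name, value):
        assert name == 'Precipitation'
        self.precipitation = value

    def update(self):
        # Toy dynamics: food insecurity accumulates with precipitation
        self.food_insecurity += 0.01 * self.precipitation

rain, indra_model = ToyRainModel(), ToyIndraModel()
for _ in range(5):
    rain.update()
    indra_model.set_value(
        'Precipitation',
        rain.get_value('atmosphere_water__rainfall_volume_flux'))
    indra_model.update()
```

The name map above plays the role of translating the hydrology model's standard variable name into the INDRA model's concept name at the set_value boundary.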
Topoflow is configured with precipitation data for the Gel-Aliab basin.
Image('demo/rivers.jpg', retina=True)
We now use this configuration of Topoflow and simulate it together with the model we just assembled.
Image('demo/topoflow_simul2.png')